Strategy, Leadership, and Narratives: Data Viz for Policy Change

Jared E. Knowles

May 7, 2014

The First Directive

Know thy data, but know thy audience better.

  • What is their question?
  • What is their timeframe?
  • What are their constraints?
  • What is their capacity?

The Struggle

[Figure: technologies]

Yabbut: too much data

[Figure: data]

Yabbut: too many priorities

[Figure: too many]

The Way Out

  1. Identify your goals

  2. Explore your data and focus

  3. Focus even narrower

  4. Find the context

  5. Put it together

What do you want to say?

[Figure: Wisconsin FRL map]

[Figure: model fit]

[Figure: EWS plots]

Of Metrics and Dashboards

Widgets provide limited utility.

[Figure: bad graph]

Identify your goals

What is the goal of this graphic?

[Figure: word cloud]

What did we learn?

  • The words "obviously" and "education" were used a lot
  • Some other words were frequent too
  • What value have we provided?
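A word cloud encodes little more than a count per word. As a minimal base-R sketch, using a hypothetical sentence rather than the text behind the slide's word cloud, the same information fits in a sorted frequency table or a plain bar chart, both of which are easier to read precisely:

```r
# Hypothetical input text (the text behind the slide's word cloud is not included)
txt <- "Education data matter. Obviously, education leaders need data, and obviously data need context."

# Split on anything that is not a letter, lower-case, and drop empty tokens
words <- tolower(unlist(strsplit(txt, "[^A-Za-z]+")))
words <- words[nchar(words) > 0]

# The whole content of a word cloud: a count per word
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))

# The same counts as a bar chart, readable at a glance
barplot(head(freq, 5), las = 2, ylab = "Count")
```

The table makes the point of the slide explicit: once the counts are visible, it is clear how little the cloud adds.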

Goals need not mean complex graphics


What do we learn?

  • More power = faster times
  • Clear outliers are identified to enable discussion
  • Reference lines are provided to give audience orientation
  • A smoother is applied to show the general trend
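A simple plot with these four features can be sketched in base R. This assumes R's built-in mtcars data as a stand-in (horsepower against quarter-mile time, where a lower time is faster); the slide's actual data and plotting code are not shown. "Outlier" here just means a residual from a linear fit larger than two standard deviations, one reasonable choice among many:

```r
# Stand-in data: R's built-in mtcars (an assumption; the slide's data is not shown)
x <- mtcars$hp    # horsepower
y <- mtcars$qsec  # quarter-mile time in seconds (lower is faster)

# Flag outliers: residuals from a linear fit more than 2 SDs from zero
fit <- lm(qsec ~ hp, data = mtcars)
res <- residuals(fit)
outlier <- abs(res) > 2 * sd(res)

plot(x, y, pch = ifelse(outlier, 17, 19),
     col = ifelse(outlier, "red", "gray40"),
     xlab = "Horsepower", ylab = "Quarter-mile time (s)")
abline(h = mean(y), lty = 2)                 # reference line: average time
abline(v = mean(x), lty = 2)                 # reference line: average power
lines(lowess(x, y), col = "blue", lwd = 2)   # smoother showing the general trend

# Label the flagged points so they can anchor discussion
if (any(outlier)) {
  text(x[outlier], y[outlier], labels = rownames(mtcars)[outlier],
       pos = 4, cex = 0.8)
}
```

None of this is complex; each element answers a question the audience would otherwise have to ask.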


Quote

How you turn dimensions in the data into visual cues for your audience is everything.

Balance is hard

Raw Data

Plot the data as is: easy to explain, but does not scale

Summarize

Calculate statistics on the data and plot them: meaningful, but may obscure important detail

Model

Model the data: the analysis of specific questions is clear, but the results are harder to explain

Test

Raw data: plot the points as is
  • Pros: easy to explain and interpret
  • Cons: does not scale well; insights are not obvious

Summarize: summarize the data and plot the summaries
  • Pros: meaningful summaries are efficient
  • Cons: more complex; summaries may skew important features

Model: model the data and show simulations or projections
  • Pros: forces the analysis to answer questions
  • Cons: complex; results are difficult to convey
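The three approaches can be contrasted on a single dataset. This is a minimal base-R sketch that assumes hypothetical simulated student records (2,000 students across 20 schools, with scores loosely tied to attendance); none of the numbers come from real data:

```r
# Hypothetical data: 2,000 simulated students across 20 schools (not real records)
set.seed(42)
n <- 2000
students <- data.frame(
  school = factor(sample(sprintf("School %02d", 1:20), n, replace = TRUE)),
  attendance = runif(n, 80, 100)
)
students$score <- 200 + 2 * students$attendance + rnorm(n, sd = 15)

# 1. Raw data: every point plotted; easy to explain, but 2,000 points overplot
plot(students$attendance, students$score, pch = ".",
     xlab = "Attendance rate (%)", ylab = "Score")

# 2. Summarize: one mean per school; efficient, but hides within-school spread
by_school <- aggregate(cbind(attendance, score) ~ school, data = students, FUN = mean)
plot(by_school$attendance, by_school$score, pch = 19,
     xlab = "Mean attendance rate (%)", ylab = "Mean score")

# 3. Model: fit a regression and report projections for specific scenarios
fit <- lm(score ~ attendance, data = students)
scenarios <- data.frame(attendance = c(85, 90, 95))
scenarios$projected <- predict(fit, newdata = scenarios)
print(scenarios)
```

Each step trades detail for focus: the raw plot keeps everything, the summary keeps 20 points, and the model reduces the question to three projected values.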

How do we choose?

Identify our goals and make a lot of graphs.

Exploratory vs. Explanatory

Exploratory graphics are what we use to understand the data: what is going on and where the key features of a dataset are.

Explanatory graphics are polished, annotated graphics that communicate a finding to the viewer. Sometimes they are self-contained.

Exploratory

[Figure: explore student scores]

Explanatory

[Figure: growth charts]

Exploratory

[Figure: actandstudents1]

Summary

[Figure: actandstudents]

Models

[Figure: EWS plots]

Don’t Underestimate Your Audience

  • One polished, detailed graph is better than ten throwaway graphs
  • Provide context and remember your limits
  • Link your graphic to things leadership cares about

Context as Key

  • Education graphics are littered with plots without context
  • Context is what creates urgency, helps focus decision making, and allows tradeoffs to be balanced

Attendance example

library(ggplot2)
library(eeptools)  # assumed source of theme_dpi()

# Yearly attendance rates with their sd and counts
attendsummary <- data.frame(year = 2008:2013,
                            att = c(97.7, 93.5, 92.6, 96.2, 96.4, 96.7),
                            sd = c(0.8, 1.2, 0.9, 0.95, 0.96, 1.2),
                            count = c(500, 460, 480, 490, 492, 460))
# geom_col() uses att directly as the bar height
ggplot(attendsummary, aes(x = factor(year), y = att)) +
  geom_col() +
  labs(x = "Year", y = "Att. Rate", title = "Attendance Rate in School Over Time") +
  theme_dpi()


Iterate

# Outlined gray bars plus a dashed red reference line at 95
ggplot(attendsummary, aes(x = factor(year), y = att)) +
  geom_col(color = "gray30", fill = "gray80") +
  labs(x = "Year", y = "Att. Rate", title = "Attendance Rate in School Over Time") +
  theme_dpi() +
  geom_hline(yintercept = 95, linetype = 2, color = "red")


Not bad

# Bars, error bars at att +/- 1.8 * sd, and a dashed reference line at 95
ggplot(attendsummary, aes(x = year, y = att, ymin = att - 1.8 * sd,
                          ymax = att + 1.8 * sd)) +
  geom_bar(stat = "identity", color = "gray30", fill = "gray80") +
  geom_errorbar(width = 0.4, color = "red") +
  ylim(c(0, 100)) +  # show the full 0-100 scale for the rate
  geom_hline(yintercept = 95, linetype = 2, color = "red")


Context

  • Moves the conversation forward
  • Focuses us on the issue at hand
  • Reduces complexity without throwing away data

Simulation
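One way to make simulation concrete is a small Monte Carlo sketch in base R that reuses the attendance summary from the example above. It assumes, as the error-bar slide implicitly does, that the sd column describes the uncertainty around each year's rate, and that each rate is roughly normal; both are assumptions, not facts established by the data. It estimates how often each year's rate would fall below the 95 reference line:

```r
# Attendance summary from the "Attendance example" slide
attendsummary <- data.frame(year = 2008:2013,
                            att = c(97.7, 93.5, 92.6, 96.2, 96.4, 96.7),
                            sd = c(0.8, 1.2, 0.9, 0.95, 0.96, 1.2),
                            count = c(500, 460, 480, 490, 492, 460))

set.seed(2014)
n_sims <- 10000

# Assumption: each year's rate ~ Normal(att, sd). One column of draws per year.
sims <- sapply(seq_len(nrow(attendsummary)), function(i) {
  rnorm(n_sims, mean = attendsummary$att[i], sd = attendsummary$sd[i])
})
colnames(sims) <- attendsummary$year

# Share of simulated rates below the 95 reference line, by year
attendsummary$p_below_95 <- colMeans(sims < 95)
print(attendsummary[, c("year", "att", "p_below_95")])
```

A probability such as "this year was almost certainly below 95" is often easier for an audience to act on than a bar with an error bar.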

Counterfactual Modeling
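A counterfactual asks what the outcome would have been under a different scenario. This minimal base-R sketch uses hypothetical simulated student data (not real records): fit a model of scores on attendance, then compare predictions for the observed attendance with predictions for a what-if scenario in which every student below 95% attendance is raised to 95%. The comparison is only as credible as the assumption that the model captures a causal effect of attendance, which a simple regression on observational data does not establish:

```r
# Hypothetical student-level data (simulated, not real records)
set.seed(7)
n <- 1000
students <- data.frame(attendance = pmin(100, rnorm(n, mean = 93, sd = 4)))
students$score <- 180 + 2.5 * students$attendance + rnorm(n, sd = 20)

# Model the outcome as a function of attendance
fit <- lm(score ~ attendance, data = students)

# Counterfactual scenario: raise every student below 95% attendance to exactly 95%
counterfactual <- students
counterfactual$attendance <- pmax(counterfactual$attendance, 95)

# Compare average predicted scores under the observed and counterfactual scenarios
observed_mean <- mean(predict(fit, newdata = students))
counterfactual_mean <- mean(predict(fit, newdata = counterfactual))
cat("Predicted mean score, observed attendance:", round(observed_mean, 1), "\n")
cat("Predicted mean score, counterfactual attendance:", round(counterfactual_mean, 1), "\n")
```

Framing the result as "what would change if" ties the model back to a decision the audience can actually make.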

Talk Outline

  • Problem Statement
      • Lots of data, lots of demand
      • Limited time, space, and opportunity
  • Role of Leadership
      • Be strategic in what you present
      • Lead your audience: analysis, not data
  • Tools
      • Context
      • Counterfactuals
      • Simulation
  • Practical



Abstract

Opinions about data visualization are everywhere, and a quick online search reveals thousands of resources on how to draw bar graphs and when a scatter plot might be effective. Getting the technical details right is important, but it is a necessary, not a sufficient, condition for driving change. To use data effectively in a decision-making context, analysts must know their audience, know the decision space, and lead their audience toward constructive engagement with the data. Using examples from senior leadership discussions at the Wisconsin Department of Public Instruction, this talk will explore how to build capacity in an audience and empower them to make decisions informed by analytics. It will focus in particular on the deliberate design choices an analyst must confront to construct visualizations that are accessible and invite discussion, including strategies such as simulation, counterfactual modeling, and the selection of context cues that bring the data into a familiar frame for the audience. The session will conclude with a brief discussion of practical advice on technologies, formats, and presentation techniques for different audience types.

Asset

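library(png)   # provides readPNG()
library(grid)  # provides grid.raster()
# Read a pre-rendered PNG and draw it on the current graphics device
img <- readPNG("assets/ewsLITplot.png")
grid.raster(img)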

EWS Model Fit Plots